Abstract

In this paper, we present near-optimal space bounds for $L_p$-samplers. Given a stream of updates (additions and subtractions) to the coordinates of an underlying vector $x \in \mathbb{R}^n$, a perfect $L_p$ sampler outputs the $i$-th coordinate with probability $|x_i|^p/\|x\|_p^p$. In SODA 2010, Monemizadeh and Woodruff showed polylog space upper bounds for approximate $L_p$-samplers and demonstrated various applications of them. Very recently, Andoni, Krauthgamer and Onak improved the upper bounds and gave an $O(\epsilon^{-p}\log^3 n)$ space, $\epsilon$ relative error and constant failure rate $L_p$-sampler for $p \in [1,2]$. In this work, we give another such algorithm requiring only $O(\epsilon^{-p}\log^2 n)$ space for $p \in (1,2)$. For $p \in (0,1)$, our space bound is $O(\epsilon^{-1}\log^2 n)$, while for the $p = 1$ case we have an $O(\log(1/\epsilon)\,\epsilon^{-1}\log^2 n)$ space algorithm. We also give an $O(\log^2 n)$ bits zero relative error $L_0$-sampler, improving the $O(\log^3 n)$ bits algorithm due to Frahling, Indyk and Sohler.

As an application of our samplers, we give better upper bounds for the problem of finding duplicates in data streams. In case the length of the stream is longer than the alphabet size, $L_1$ sampling gives us an $O(\log^2 n)$ space algorithm, thus improving the previous $O(\log^3 n)$ bound due to Gopalan and Radhakrishnan.

In the second part of our work, we prove an $\Omega(\log^2 n)$ lower bound for sampling from $0, \pm 1$ vectors (in this special case, the parameter $p$ is not relevant for $L_p$ sampling). This matches the space of our sampling algorithms for constant $\epsilon > 0$. We also prove tight space lower bounds for the finding duplicates and heavy hitters problems. We obtain these lower bounds using reductions from the communication complexity problem augmented indexing.



1 Introduction 



Sampling has become an indispensable tool in analysing massive data sets, and particularly in processing data streams. In the past decade, several sampling techniques have been proposed and studied for the data stream model [3, 11, 5, 10, 23, 1]. In this work, we study $L_p$-samplers, a new variant of space efficient samplers for data streams that was introduced by Monemizadeh and Woodruff in [23]. Roughly speaking, given a stream of updates (additions and subtractions) to the coordinates of an underlying vector $x \in \mathbb{R}^n$, an $L_p$-sampler processes the stream and outputs a sample coordinate of $x$ where the $i$-th coordinate is picked with probability proportional to $|x_i|^p$.

In [23], it was observed that $L_p$-samplers lead to alternative algorithms for many known streaming problems, including heavy hitters and frequency moment estimation. In this paper, we focus on a specific application, namely finding duplicates in long streams, although our $L_p$ samplers work for, and often give better space performance in, all applications listed in [23]. We refer the reader to [23] and [1] for further applications of $L_p$-samplers.

Observe that we allow both negative and positive updates to the coordinates of the underlying vector. In the case where only positive updates are allowed and $p = 1$, the problem is well understood. For instance, the classical reservoir sampling [20] from the 60's (attributed to Alan G. Waterman) gives a simple solution as follows. Given a pair $(i, u)$, indicating an addition of $u$ to the $i$-th coordinate of the underlying vector $x$, the sampler maintains $s$, the sum of the updates seen so far. It replaces the current sample with $i$ with probability $u/s$; otherwise it does nothing and moves to the next update. It is easy to verify that this is a perfect $L_1$-sampler and the space usage is only $O(1)$ words.
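The reservoir rule just described can be sketched in Python as follows. This is a minimal illustration that assumes an insertion-only stream of `(i, u)` pairs with `u > 0`. The function name `reservoir_l1_sample` is ours and does not come from [20]. The sketch keeps only the running total and the current sample.

```python
import random
from typing import Iterable, Optional, Tuple


def reservoir_l1_sample(stream: Iterable[Tuple[int, float]],
                        rng: random.Random) -> Optional[int]:
    """Perfect L1 sample for an insertion-only stream of (i, u) pairs, u > 0.

    Keeps O(1) state: the running total s and the current sample.
    Returns None for an empty stream.
    """
    s = 0.0
    sample: Optional[int] = None
    for i, u in stream:
        if u <= 0:
            raise ValueError("reservoir sampling needs positive updates")
        s += u  # s now includes the current update
        # Replace the current sample with i with probability u / s.
        if rng.random() < u / s:
            sample = i
    return sample
```

For the sequence (0, 1), (1, 3), (0, 2), (2, 4), which defines $x = (3, 3, 4)$, repeated runs should return index 2 about 40% of the time.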

With the presence of negative updates, sampling becomes a non-trivial problem. In this case, it is not clear at all how to maintain samples without keeping track of the updates to the individual coordinates. In fact, the question regarding the mere existence of such samplers was raised a few years ago by Cormode, Muthukrishnan, and Rozenbaum in [?]. Last year in SODA 2010, Monemizadeh and Woodruff [23] answered this question affirmatively, albeit in an approximate sense. Before stating their results we give a formal definition of $L_p$-samplers.

Definition 1. Let $x \in \mathbb{R}^n$ be a non-zero vector. For $p > 0$ we call the $L_p$ distribution corresponding to $x$ the distribution on $[n]$ that takes $i$ with probability

$$\frac{|x_i|^p}{\|x\|_p^p},$$

with $\|x\|_p = \left(\sum_{i=1}^n |x_i|^p\right)^{1/p}$. For $p = 0$, the $L_0$ distribution corresponding to $x$ is the uniform distribution over the non-zero coordinates of $x$.
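For concreteness, Definition 1 can be written as a small Python function. This is an illustrative sketch, and the name `lp_distribution` is hypothetical. Indices are 0-based. Zero coordinates are left out of the returned dictionary because they have probability 0 under every $L_p$ distribution.

```python
from typing import Dict, Sequence


def lp_distribution(x: Sequence[float], p: float) -> Dict[int, float]:
    """Return {i: Pr[i]} for the L_p distribution of Definition 1 (0-based)."""
    if all(v == 0 for v in x):
        raise ValueError("the L_p distribution is undefined for the zero vector")
    if p == 0:
        # L_0: uniform over the non-zero coordinates.
        support = [i for i, v in enumerate(x) if v != 0]
        return {i: 1.0 / len(support) for i in support}
    weights = {i: abs(v) ** p for i, v in enumerate(x) if v != 0}
    total = sum(weights.values())  # equals ||x||_p^p
    return {i: w / total for i, w in weights.items()}
```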

We call a streaming algorithm a perfect $L_p$-sampler if it outputs an index according to this distribution and fails only if $x$ is the zero vector. An approximate $L_p$-sampler may fail, but the distribution of its output should be close to the $L_p$ distribution. In particular, we speak of an $\epsilon$ relative error $L_p$-sampler if, conditioned on no failure, it outputs the index $i$ with probability $(1 \pm \epsilon)|x_i|^p/\|x\|_p^p + O(n^{-c})$, where $c$ is an arbitrary constant. For $p = 0$ the corresponding formula is $(1 \pm \epsilon)/k + O(n^{-c})$, where $k$ is the number of non-zero coordinates in $x$. Unless stated otherwise we assume that the failure probability is at most $1/2$.

In this definition one can consider $c$ to be 2, but all existing constructions of $L_p$-samplers work for an arbitrary $c$ with just a constant factor increase in the space, so we will not specify $c$ in the following and ignore errors of probability $n^{-c}$.

Previous work. A zero relative error $L_0$-sampler which uses $O(\log^3 n)$ bits was shown in [12]. In [23], the authors gave an $\epsilon$ relative error $L_p$-sampler for $p \in [0,2]$ which uses $\mathrm{poly}(\epsilon^{-1}, \log n)$ space. They also showed a 2-pass $O(\mathrm{polylog}\, n)$ space zero relative error $L_p$-sampler for any $p \in [0,2]$. In addition, they demonstrated that $L_p$-samplers can be used as a black box to obtain streaming algorithms for other problems such as $L_p$ estimation (for $p > 2$), heavy hitters, and cascaded norms [15]. Unfortunately, due to the large exponents in their bounds, the $L_p$-samplers given there do not lead to efficient solutions for the aforementioned applications.

Very recently, Andoni, Krauthgamer and Onak in [1] improved the results of [23] considerably. Through the adaptation of a generic and simple method, named precision sampling, they managed to bring down the space upper bounds to $O(\epsilon^{-p}\log^3 n)$ bits for $\epsilon$ relative error $L_p$-samplers for $p \in [1,2]$.

Roughly speaking, the idea of precision sampling is to scale the input vector with random coefficients so that the $i$-th coordinate becomes the maximum with probability roughly proportional to $|x_i|^p$. The maximum (heavy) coordinate is then found through a small-space heavy hitter algorithm. In more detail, the input vector $(x_1, \ldots, x_n)$ is scaled by random coefficients $(t_1^{-1}, \ldots, t_n^{-1})$, where each $t_i$ is picked uniformly at random from $[0,1]$. Let $z = (x_1 t_1^{-1}, \ldots, x_n t_n^{-1})$ be the scaled vector. The important observation here is that $\Pr[t_i^{-1} \ge t] = 1/t$ for $t \ge 1$. Hence, for instance, by replacing $t$ with $\|x\|_1/|x_i|$ we get $\Pr[|z_i| \ge \|x\|_1] = |x_i|/\|x\|_1$. (In the same manner, one can scale $x_i$ by $t_i^{-1/p}$ instead of $t_i^{-1}$ and get a similar result for general $p$.) It turns out that we only need a constant approximation to $\|x\|_1$, and we then look for a coordinate in $z$ that has reached a threshold of $O(\|x\|_1)$. On the other hand, it is shown that the heaviest coordinate in $z$ has a weight of $\Omega(\log^{-1} n)\|z\|_1$ (with constant probability), and thus a small-space heavy hitter computation can be used to find the maximum. In particular, the $L_p$-sampler of [1] adapts the popular count-sketch scheme [6] for this purpose.
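The key identity $\Pr[|x_i|/t_i \ge \|x\|_1] = |x_i|/\|x\|_1$ can be checked by simulation. The Monte Carlo sketch below is our own illustration and is not code from [1]. It draws $t_i$ uniformly from $(0,1]$ rather than $[0,1]$ to avoid division by zero, which does not change the distribution.

```python
import random
from typing import Sequence


def scaled_exceeds_norm_freq(x: Sequence[float], i: int, trials: int,
                             rng: random.Random) -> float:
    """Estimate Pr[|x_i| / t_i >= ||x||_1] with t_i uniform on (0, 1]."""
    norm1 = sum(abs(v) for v in x)
    hits = 0
    for _ in range(trials):
        t = 1.0 - rng.random()  # uniform on (0, 1]
        if abs(x[i]) / t >= norm1:
            hits += 1
    return hits / trials
```

For $x = (1, 2, -5, 2)$ we have $\|x\|_1 = 10$, so the estimate for the third coordinate should be close to $0.5$ and the estimate for the first coordinate close to $0.1$.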

Our contributions. In this paper, we give $L_p$-samplers requiring only $O(\epsilon^{-p}\log^2 n)$ space for $p \in (1,2)$. For $p \in (0,1)$, our space bound is $O(\epsilon^{-1}\log^2 n)$, while for the $p = 1$ case we have an $O(\log(1/\epsilon)\,\epsilon^{-1}\log^2 n)$ space algorithm. In essence, our sampler follows the basic structure of the precision sampling method explained above. However, compared to [1], we do a sharper analysis of the error terms in the count-sketch. Through additional ideas, we manage to get rid of a log factor and preserve the previous dependence on $\epsilon$. Roughly speaking, we use the fact that the error term in the count-sketch is bounded by the $L_2$ norm of the tail distribution of $z$ (the heavy coordinates do not contribute). On the other hand, taking the distribution of the random coefficients into account, we bound this by $O(\|x\|_p)$, which enables us to save a log factor. Additionally, to preserve the dependence on $\epsilon$, we have to use a slightly more powerful source of randomness for choosing the scaling factors (in contrast with the pairwise independence of [1]). We also take care of some subtle issues regarding the conditioning on the error terms, which are not handled in the previous work (Lemma 3).

As $p$ approaches zero, precision sampling becomes very inefficient, as the random coefficients $t_i^{-1/p}$ tend to infinity. For the $p = 0$ case, we present a completely different algorithm. Briefly, our $L_0$-sampler tries to detect a non-zero coordinate by picking random subsets of $[n]$. The non-zero coordinates are found by an exact sparse recovery procedure, and Nisan's PRG [25] is applied to decrease the randomness involved. Our $O(\log^2 n)$ space bound compares favorably to the previous algorithms, which use $O(\log^3 n)$ space [12] and $\mathrm{poly}(\log n, \epsilon^{-1})$ space [23], respectively (the latter one gives only $\epsilon$ relative error sampling).

In Section 4 we prove that sampling from $0, \pm 1$ vectors requires $\Omega(\log^2 n)$ space, by a reduction from the communication complexity problem augmented indexing. In this special case $p$ is not relevant for $L_p$-sampling. Hence this shows that our $L_0$-sampling algorithm uses the optimal space up to constant factors, and that our $L_p$-sampler for $p \in (0,2)$ has the optimal space (up to constant factors) for constant $\epsilon > 0$.

Given a stream of length $n+1$ over the alphabet $[n]$, the finding duplicates problem asks to output some $a \in [n]$ that has appeared at least twice in the stream. Observe that by the pigeonhole principle, such an $a$ always exists. Prior to our work, the best upper bound for finding duplicates was due to Gopalan and Radhakrishnan [13], who gave a one-pass $O(\log^3 n)$ bits randomized algorithm with constant failure rate. Here we settle the one-pass complexity of this problem by giving an $O(\log^2 n)$ space algorithm via a direct application of our $L_1$ sampler, and by giving an $\Omega(\log^2 n)$ lower bound afterwards. Combined with a sparse recovery procedure, our solution also generalizes to a near-optimal $O(\log^2 n + s\log n)$ space algorithm for finding duplicates in streams of length $n - s$, improving on the $O(s\log^3 n)$ result of [14].

Finally, we prove lower bounds for the problem of finding heavy hitters in update streams, which is closely related to the $L_p$-sampling problem. This lower bound is also obtained by a reduction from augmented indexing. It proves that any $L_p$ heavy hitters algorithm (defined in Section 4.4) must use $\Omega(\phi^{-p}\log^2 n)$ space, even in the strict turnstile model. Our lower bound essentially matches the known upper bounds [?], which work in the general update model.

¹ Further, we note that our algorithm not only produces a sample $i$ from the $L_p$ distribution, but also approximates $x_i$. A similar approximation is also produced by the $L_p$ sampler of [1], but they claim to give an approximation of $|x_i|^p/\|x\|_p^p$. However, this claim cannot hold for $p < 2$, as it would contradict the $\Omega(\epsilon^{-2})$ space lower bound for estimating Hamming distance.
Related work. In [3, 5], the authors studied sampling from sliding windows, and the recent paper of Cormode et al. [10] generalizes the classical reservoir sampling to distributed streams. These works only support insertion streams. The basic idea of random scaling used in [1] and in our paper appeared earlier in the priority sampling technique [11], where the focus is to estimate the weight of certain subsets of a vector defined by a sequence of positive updates.

Finding duplicates in streams was first considered in the context of detecting fraud in click streams [21]. Muthukrishnan in [24] asked whether this problem can be solved in $O(\mathrm{polylog}\, n)$ space using a constant number of passes. In [?], Tarui showed that any $k$-pass deterministic algorithm must use $\Omega(n^{1/(2k-1)})$ space.

Heavy hitter algorithms have been studied extensively. The work of Berinde et al. [4] gives tight lower bounds for heavy hitters under insertion-only streams. We are not aware of similar works on general update streams. However, the recent works [?] are closely related: their authors show lower bounds (via augmented indexing) for approximate sparse recovery and for Johnson-Lindenstrauss transforms, respectively.

Notation. We write $[n]$ for the set $\{1, \ldots, n\}$. An update stream is a sequence of tuples $(i, u)$, where $i \in [n]$ and $u \in \mathbb{R}$. The stream of updates implicitly defines an $n$-dimensional vector $x \in \mathbb{R}^n$ as follows. Initially, $x$ is the zero vector. An update of the form $(i, u)$ adds $u$ to the coordinate $x_i$ of $x$ (leaving the other coordinates unchanged). In the strict turnstile model we are guaranteed that all coordinates of $x$ are non-negative at the end of the stream (although negative updates are still allowed); in the general model no such guarantee exists. Our algorithms (like most other algorithms in the literature) work by maintaining a linear sketch $L : \mathbb{R}^n \to \mathbb{R}^m$. When computing the space requirement of such a streaming algorithm, we assume all the updates are integers ($u \in \mathbb{Z}$) and the coordinates of the vector $x$ remain bounded by some value $M = \mathrm{poly}(n)$ throughout the stream. We make sure that the matrix of $L$ also has polynomially bounded integer entries. This way, maintaining $L(x)$ requires updating $m$ integer counters and takes $O(m\log n)$ bits with fast update time (especially since the matrices we consider are sparse). This discretization step is standard and thus we omit most details.

In the standard model for randomized streaming algorithms the random bits used (to generate the 
random linear map L, for example) are part of the space bound. In contrast, our lower bounds do not 
make any assumption on the working of the streaming algorithm and allow for the random oracle model, 
where the algorithm is allowed free access to a random string at any time. All lower bounds are proved 
through reductions from communication problems. 

We say an event happens with low probability if the probability can be made less than $n^{-c}$. Here $c > 0$ is an arbitrary constant; for example, one can set $c = 2$. The actual value of $c$ has limited effect on the space of our algorithm: it changes only the unspecified constants hidden in the $O$ notation. We will routinely ignore low probability events, sometimes even $O(n)$ of them, which is okay as we leave $c$ unspecified. Events complementary to low probability events are referred to as high probability events.

For $0 \le m \le n$ we call the vector $x \in \mathbb{R}^n$ $m$-sparse if all but at most $m$ coordinates of $x$ are zero. We define $\mathrm{Err}_2^m(x) = \min \|x - \hat{x}\|_2$, where $\hat{x} \in \mathbb{R}^n$ ranges over all the $m$-sparse vectors.
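The best $m$-sparse approximation of $x$ keeps its $m$ largest-magnitude coordinates, so $\mathrm{Err}_2^m(x)$ is the $L_2$ norm of the remaining tail. A short Python sketch follows. The function name `err2` is ours.

```python
import math
from typing import Sequence


def err2(x: Sequence[float], m: int) -> float:
    """Err_2^m(x): L2 distance from x to its best m-sparse approximation.

    Keeping the m largest-magnitude entries is optimal, so the error is
    the L2 norm of all remaining (tail) entries.
    """
    if m < 0:
        raise ValueError("m must be non-negative")
    mags = sorted((abs(v) for v in x), reverse=True)
    return math.sqrt(sum(v * v for v in mags[m:]))
```

For example, for $x = (3, -4, 1, 0)$ and $m = 1$ the coordinate $-4$ is kept and the error is $\sqrt{3^2 + 1^2} = \sqrt{10}$.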

2 The $L_p$ Sampler

In this section, we present our $L_p$ sampler algorithm. In the following, we assume $p \in (0,2)$. This particular method does not seem to be applicable for the $p = 2$ case, and we know of no $O(\log^2 n)$ space $L_2$-sampling algorithm. We treat the $p = 0$ case separately later.

We start by stating the properties of the two streaming algorithms we are going to use. Both are based on maintaining $L(x)$ for a well chosen random linear map $L : \mathbb{R}^n \to \mathbb{R}^{n'}$ with $n' < n$.

The count-sketch algorithm [6] is so simple that we cannot resist the temptation to define it here. For parameter $m$, the count-sketch algorithm works as follows. It selects independent samples $h_j : [n] \to [6m]$ and $g_j : [n] \to \{1, -1\}$ from pairwise independent uniform hash families for $j \in [l]$ and $l = O(\log n)$. It computes the following linear function of $x$ for $j \in [l]$ and $k \in [6m]$: $y_{k,j} = \sum_{i \in [n],\, h_j(i) = k} g_j(i)\, x_i$. Finally, it outputs $x^* \in \mathbb{R}^n$ as an approximation of $x$ with $x^*_i = \mathrm{median}(g_j(i)\, y_{h_j(i),j} : j \in [l])$ for $i \in [n]$.
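The count-sketch just defined can be sketched in Python as follows. This is an illustration under stated assumptions and not the exact construction analysed in [6]. The hash functions are drawn from the family $i \mapsto (a i + b) \bmod P$ with $P = 2^{31}-1$ and then reduced modulo $6m$ or modulo 2, which makes them only approximately pairwise independent. Indices are 0-based. The class and method names are hypothetical.

```python
import random
import statistics
from typing import List

P = 2_147_483_647  # Mersenne prime 2^31 - 1; assumed larger than n


class CountSketch:
    """Count-sketch with l rows of 6m counters, updated under turnstile streams."""

    def __init__(self, n: int, m: int, l: int, rng: random.Random) -> None:
        self.n, self.width, self.l = n, 6 * m, l
        # Row j uses h_j(i) = ((a*i + b) mod P) mod 6m and a +1/-1 sign g_j(i)
        # from the parity of ((c*i + d) mod P); these are only approximately
        # pairwise independent (a simplifying assumption).
        self.h = [(rng.randrange(1, P), rng.randrange(P)) for _ in range(l)]
        self.g = [(rng.randrange(1, P), rng.randrange(P)) for _ in range(l)]
        # self.y[j][k] plays the role of y_{k,j} in the text.
        self.y: List[List[float]] = [[0.0] * self.width for _ in range(l)]

    def _bucket(self, j: int, i: int) -> int:
        a, b = self.h[j]
        return ((a * i + b) % P) % self.width

    def _sign(self, j: int, i: int) -> int:
        c, d = self.g[j]
        return 1 if ((c * i + d) % P) % 2 == 0 else -1

    def update(self, i: int, u: float) -> None:
        """Stream update (i, u): add g_j(i) * u to counter y[j][h_j(i)] in every row."""
        for j in range(self.l):
            self.y[j][self._bucket(j, i)] += self._sign(j, i) * u

    def estimate(self, i: int) -> float:
        """x*_i = median over rows j of g_j(i) * y[j][h_j(i)]."""
        return statistics.median(
            self._sign(j, i) * self.y[j][self._bucket(j, i)] for j in range(self.l)
        )
```

Because `update` only adds $g_j(i)\,u$ to one counter per row, the sketch is linear. As a result, positive and negative updates to the same coordinate cancel exactly.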

The performance guarantee of the count-sketch algorithm is as follows. (For a compact proof see a 
recent survey by Gilbert and Indyk [13].) 
Initialization Stage:

1. For $0 < p < 2$, $p \ne 1$, set $k = 10\lceil 1/|p-1| \rceil$ and $m = O(\epsilon^{-\max(0,\,p-1)})$ with a large enough constant factor.

2. For $p = 1$ set $k = m = O(\log(1/\epsilon))$ with a large enough constant factor.

3. Set $\beta = \epsilon^{1-1/p}$ and $l = O(\log n)$ with a large enough constant factor.

4. Select $k$-wise independent uniform scaling factors $t_i \in [0,1]$ for $i \in [n]$.

5. Select the appropriate random linear functions for the execution of the count-sketch algorithm and $L$ and $L'$ for the norm estimations in the processing stage.

Processing Stage:

1. Use count-sketch with parameter $m$ for the scaled vector $z \in \mathbb{R}^n$ with $z_i = x_i/t_i^{1/p}$.

2. Maintain a linear sketch $L(x)$ as needed for the $L_p$ norm approximation of $x$.

3. Maintain a linear sketch $L'(z)$ as needed for the $L_2$ norm estimation of $z$.

Recovery Stage:

1. Compute the output $z^*$ of the count-sketch and its best $m$-sparse approximation $\hat{z}$.

2. Based on $L(x)$ compute a real $r$ with $\|x\|_p \le r \le 2\|x\|_p$.

3. Based on $L'(z - \hat{z}) = L'(z) - L'(\hat{z})$ compute a real $s$ with $\|z - \hat{z}\|_2 \le s \le 2\|z - \hat{z}\|_2$.

4. Find $i$ with $|z^*_i|$ maximal.

5. If $s > \beta m^{1/2} r$ or $|z^*_i| < \epsilon^{-1/p} r$, output FAIL.

6. Output $i$ as the sample and $z^*_i t_i^{1/p}$ as an approximation for $x_i$.

Figure 1: Our approximate $L_p$-sampler with both success probability and relative error $\Theta(\epsilon)$.
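The following Python sketch runs an idealized round to show the threshold rule of Figure 1 in action. It is deliberately simplified and should not be read as the algorithm of Figure 1. The simplifications are as follows:

- $z$ is computed exactly instead of through the count-sketch.
- $r = \|x\|_p$ exactly instead of a 2-approximation.
- The $t_i$ are fully independent rather than $k$-wise independent.
- The test on $s$ is dropped.

The function name is hypothetical, and indices are 0-based.

```python
import random
from typing import Optional, Sequence, Tuple


def idealized_figure1_round(x: Sequence[float], p: float, eps: float,
                            rng: random.Random) -> Optional[Tuple[int, float]]:
    """One idealized round of the Figure 1 rule; returns None for FAIL.

    Simplifications (assumptions): z is exact, r = ||x||_p exactly,
    the t_i are fully independent, and the test on s is skipped.
    """
    r = sum(abs(v) ** p for v in x) ** (1.0 / p)
    t = [1.0 - rng.random() for _ in x]               # t_i uniform on (0, 1]
    z = [v / ti ** (1.0 / p) for v, ti in zip(x, t)]  # z_i = x_i / t_i^{1/p}
    i = max(range(len(z)), key=lambda k: abs(z[k]))   # index with |z_i| maximal
    if abs(z[i]) < eps ** (-1.0 / p) * r:
        return None                                   # FAIL
    return i, z[i] * t[i] ** (1.0 / p)                # sample and estimate of x_i
```

Under these simplifications a round succeeds with probability close to $\epsilon$. Conditioned on success, the output index should follow the $L_p$ distribution up to a relative error of order $\epsilon$, in line with Lemma 4 below.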

Lemma 1. [6] For any $x \in \mathbb{R}^n$ and $m \ge 1$ we have $|x_i - x^*_i| \le \mathrm{Err}_2^m(x)/m^{1/2}$ for all $i \in [n]$ with high probability, where $x^*$ is the output of the count-sketch algorithm with parameter $m$. As a consequence we also have

$$\mathrm{Err}_2^m(x) \le \|x - \hat{x}\|_2 \le 10\,\mathrm{Err}_2^m(x)$$

with high probability, where $\hat{x}$ is the $m$-sparse vector best approximating $x^*$ (i.e., $\hat{x}_i = x^*_i$ for the $m$ coordinates $i$ with $|x^*_i|$ highest and $\hat{x}_i = 0$ for the remaining $n - m$ coordinates).

We will also need the following result for the estimation of $L_p$ norms.

Lemma 2. [17] For any $p \in (0,2]$ there is a streaming algorithm based on a random linear map $L : \mathbb{R}^n \to \mathbb{R}^l$ with $l = O(\log n)$ that outputs a value $r$ computed solely from $L(x)$ that satisfies $\|x\|_p \le r \le 2\|x\|_p$ with high probability.

Our streaming algorithm in Figure 1 makes use of a single count-sketch and two norm estimation algorithms. The count-sketch is for the randomly scaled version $z$ of the vector $x$. One of the norm approximation algorithms is for $\|x\|_p$. The other one approximates $\mathrm{Err}_2^m(z)$ through the almost equal value $\|z - \hat{z}\|_2$. A standard $L_2$ approximation for $z$ works if we modify $z$ by subtracting $\hat{z}$ in the recovery stage. One can get arbitrarily good approximations of $\mathrm{Err}_2^m(x)$ this way.

First we estimate the probability that the algorithm aborts because $s > \beta m^{1/2} r$. This depends on the scaling that resulted in $z$, and it will be important for us that the bound holds even after conditioning on any one scaling factor.

Lemma 3. Conditioned on an arbitrary fixed value $t$ of $t_i$ for a single index $i \in [n]$, we have $\Pr[s > \beta m^{1/2} r \mid t_i = t] = O(\epsilon + n^{-c})$.

Proof. First note that by Lemma 2 we have $r \ge \|x\|_p$ and $s \le 2\|z - \hat{z}\|_2$ with high probability. By Lemma 1 we have $\|z - \hat{z}\|_2 \le 10\,\mathrm{Err}_2^m(z)$, also with high probability. We may therefore assume that all of these inequalities hold, and in particular that $r \ge \|x\|_p$ and $s \le 20\,\mathrm{Err}_2^m(z)$. It is therefore enough to bound the probability that $20\,\mathrm{Err}_2^m(z) > \beta m^{1/2}\|x\|_p$.

For simplicity (and without loss of generality) we assume that the fixed scalar is $t_n = t$, and we will freely use $i$ for indices in $[n-1]$.

Let $T = \beta\|x\|_p$. For each $i \in [n-1]$ we define two variables $z'_i$ and $z''_i$ determined by $z_i$ as follows. The indicator variable $z'_i$ equals 1 if $|z_i| > T$ and 0 otherwise. We set $z''_i = z_i^2(1 - z'_i)/T^2 \in [0,1]$. Let $S' = \sum_{i \in [n-1]} z'_i$ and $S'' = \sum_{i \in [n-1]} z''_i$. Note that $T^2 S'' = \|z - w\|_2^2$, where $w$ is defined by $w_i = z_i z'_i$ for $i \in [n-1]$ and $w_n = z_n$. Here $w$ is $(S'+1)$-sparse, so we have $\mathrm{Err}_2^m(z) \le T S''^{1/2}$ unless $S' \ge m$. It is therefore enough to bound the probabilities of the events $S' \ge m$ and $S'' > m\beta^2\|x\|_p^2/(20T)^2 = m/400$, each by $O(\epsilon)$.

We have $E[z'_i] \le |x_i|^p/T^p$ and hence $E[S'] \le \beta^{-p} = \epsilon^{1-p}$. By our choice of $m$ and the concentration of $S'$ provided by $k$-wise independence, we have $\Pr[S' \ge m] = O(\epsilon)$, as needed. The calculation for $S''$ is similar. We have

$$E[z''_i] \le \int_{|x_i|^p/T^p}^{\infty} x_i^2\, t^{-2/p}\, T^{-2}\, dt = \frac{p}{2-p}\,|x_i|^p\, T^{-p}.$$

Thus $E[S''] \le \frac{p}{2-p}\|x\|_p^p T^{-p} = O(\beta^{-p}) = O(\epsilon^{1-p})$. Note that the $z''_i$ are not indicator variables like the $z'_i$. They are still $k$-wise independent random variables from $[0,1]$, however, so we can bound the probability of a large deviation for $S''$ as we did for $S'$. This completes the proof of the lemma. □
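The integral in the proof can be expanded step by step. The expansion below is our own, writing $a = |x_i|^p/T^p$ for the lower limit. The variable $z''_i$ is non-zero only when $|z_i| \le T$, that is, when $t_i \ge a$. The density of $t_i$ is at most 1, and extending the upper limit from 1 to $\infty$ can only increase the integral. Convergence uses $p < 2$.

```latex
\begin{align*}
E[z''_i] &\le \int_a^{\infty} \frac{x_i^2\, t^{-2/p}}{T^2}\, dt
          = \frac{x_i^2}{T^2}\cdot\frac{a^{1-2/p}}{2/p-1}
          = \frac{x_i^2}{T^2}\cdot\frac{p}{2-p}\cdot\frac{|x_i|^{p-2}}{T^{p-2}}
          = \frac{p}{2-p}\,|x_i|^p\, T^{-p},\\
E[S''] &\le \frac{p}{2-p}\,\|x\|_p^p\, T^{-p} = \frac{p}{2-p}\,\beta^{-p}.
\end{align*}
```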

The fact that our algorithm is an approximate $L_p$-sampler with both relative error and success probability $\Theta(\epsilon)$ follows from the following lemma. Indeed, suppose the probabilities were exactly $\epsilon|x_i|^p/r^p$ and $\|x\|_p \le r \le 2\|x\|_p$ always held. Then we would make no relative error, and the success probability would be $E[\epsilon\|x\|_p^p/r^p] \ge \epsilon/2^p$.

Lemma 4. The probability that the algorithm of Figure 1 outputs the index $i \in [n]$, conditioned on a fixed value for $r \ge \|x\|_p$, is $(\epsilon + O(\epsilon^2))|x_i|^p/r^p + O(n^{-c})$. The relative error of the estimate for $x_i$ is at most $\epsilon$ with high probability.

Proof. Optimally, we would output $i \in [n]$ if $|z_i| > \epsilon^{-1/p} r$. This happens if $t_i < \epsilon|x_i|^p/r^p$ and has probability exactly $\epsilon|x_i|^p/r^p$. We have to estimate the probability that something goes wrong, that is, that the algorithm outputs $i$ when this simple condition is not met, or vice versa.

Three things can go wrong. First, if $s > \beta m^{1/2} r$ the algorithm fails. This is only a problem for our calculation if it should, in fact, output the index $i$. Lemma 3 bounds the conditional probability of this happening.

Having dealt with the $s > \beta m^{1/2} r$ case, we may now assume $s \le \beta m^{1/2} r$. We also make the assumption (true with high probability by Lemma 2) that $\|z - \hat{z}\|_2 \le s$, and thus $\mathrm{Err}_2^m(z) \le \|z - \hat{z}\|_2 \le s \le \beta m^{1/2} r$. Finally, we also assume $|z^*_i - z_i| \le \mathrm{Err}_2^m(z)/m^{1/2} \le \beta r$ for all $i \in [n]$. This is satisfied with high probability by Lemma 1.

A second source of error comes from this possible difference of up to $\beta r$ between $z^*_i$ and $z_i$. This can only cause a problem if $t_i$ is close to the threshold, namely if $(\epsilon^{-1/p} + \beta)^{-p}|x_i|^p/r^p < t_i < (\epsilon^{-1/p} - \beta)^{-p}|x_i|^p/r^p$. The probability of selecting $t_i$ from this interval is $O(\beta\epsilon^{1+1/p}|x_i|^p/r^p) = O(\epsilon^2|x_i|^p/r^p)$, as required.

Finally, the third source of error comes from the possibility that $i$ should be output based on $|z_i| > \epsilon^{-1/p} r$, yet we output another index $i' \ne i$ because $|z^*_{i'}| \ge |z^*_i|$. In this case we must have $t_{i'} < (\epsilon^{-1/p} - \beta)^{-p}|x_{i'}|^p/r^p$. This has probability $O(\epsilon|x_{i'}|^p/r^p)$. By the union bound, the probability that such an index $i'$ exists is $O(\epsilon\|x\|_p^p/r^p) = O(\epsilon)$. Pairwise independence is enough to conclude that the same bound holds after conditioning on $|z_i| > \epsilon^{-1/p} r$. This finishes the proof of the first statement of the lemma.

The algorithm only outputs an index $i$ if $s \le \beta m^{1/2} r$ and $|z^*_i| \ge \epsilon^{-1/p} r$. The first condition implies that the absolute approximation error for $z_i$ is at most $\beta r$. The second lower bounds the absolute value of the approximation itself by $\epsilon^{-1/p} r$. Together they ensure a relative error of at most $\beta\epsilon^{1/p} = \epsilon$. Our approximation for $x_i = z_i t_i^{1/p}$ is $z^*_i t_i^{1/p}$, so the relative error is the same. Note that the inverse polynomial error probability comes from the various low probability events we neglected. The same is true for the additive error term in the distribution. □
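The interval estimate used for the second source of error can be checked with a first-order expansion of $y \mapsto y^{-p}$ around $y_0 = \epsilon^{-1/p}$. This gloss is ours rather than part of the proof. It is accurate when $\beta$ is small compared with $\epsilon^{-1/p}$, which holds because $\beta/\epsilon^{-1/p} = \epsilon$. Since $t_i$ is uniform, multiplying the interval length by $|x_i|^p/r^p$ gives the probability bound. The last line gives the relative error $\beta\epsilon^{1/p}$ used at the end of the proof.

```latex
\begin{align*}
\left|\frac{d}{dy}\, y^{-p}\right|_{y=\epsilon^{-1/p}} &= p\,\epsilon^{(p+1)/p} = p\,\epsilon^{1+1/p},\\
(\epsilon^{-1/p}-\beta)^{-p} - (\epsilon^{-1/p}+\beta)^{-p} &\approx 2\beta\, p\,\epsilon^{1+1/p}
  = 2p\,\epsilon^{1-1/p}\,\epsilon^{1+1/p} = 2p\,\epsilon^{2},\\
\beta\,\epsilon^{1/p} &= \epsilon^{1-1/p}\,\epsilon^{1/p} = \epsilon .
\end{align*}
```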

Theorem 1. For $\delta > 0$, $\epsilon > 0$ and $0 < p < 2$ there is an $O(\epsilon)$ relative error one-pass $L_p$-sampling algorithm with failure probability at most $\delta$ and with low probability that the relative error of the estimate for the selected coordinate is more than $\epsilon$. The algorithm uses $O_p(\epsilon^{-\max(1,p)}\log^2 n\log(1/\delta))$ space for $p \ne 1$, while for $p = 1$ the space is $O(\epsilon^{-1}\log(1/\epsilon)\log^2 n\log(1/\delta))$.

Proof. Using Lemma 4 and the fact that $\|x\|_p \le r \le 2\|x\|_p$ with high probability, one obtains that the failure probability of the algorithm in Figure 1 is at most $1 - \epsilon/2^p + O(n^{-c})$. Conditioned on obtaining an output, returning $i$ has probability $(1 + O(\epsilon))|x_i|^p/\|x\|_p^p + O(n^{-c})$. Clearly, the latter statement remains true for any number of repetitions, and the failure probability is raised to the power $v$ for $v$ repetitions. Thus, using $v = O(\log(1/\delta)/\epsilon)$ repetitions (taking the first non-failing output), the algorithm is an $O(\epsilon)$ relative error, $\delta$ failure probability $L_p$-sampling algorithm. Here we assume $v < n$, as otherwise recording the entire vector $x$ is more efficient.

The low probability of more than $\epsilon$ relative error in estimating $x_i$ also follows from Lemma 4. In one round, the algorithm in Figure 1 uses $O(m\log n)$ counters for the count-sketch, and this dominates the counters for the norm estimators. Using standard discretization, this can be turned into an $O(m\log^2 n)$ bit algorithm. For the discretization we also have to keep our scaling factors polynomial in $n$. Recall that in the continuous model these factors $t_i^{-1/p}$ were unbounded. But we can safely declare failure if $t_i^{-1} > n^c$ for some $i \in [n]$, as this has low probability $n^{1-c}$. We have to do the $v$ repetitions of the algorithm in parallel to obtain a single pass streaming algorithm. This increases the space to $O(v m\log^2 n)$, which matches the bound claimed in the theorem. □
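The choice of $v$ in the proof can be made concrete. As a simplification, the sketch below assumes that one round of Figure 1 fails with probability at most $q = 1 - \epsilon/2^p$ and ignores the $O(n^{-c})$ term. The function name `repetitions_needed` is hypothetical. It returns the smallest $v$ with $q^v \le \delta$. This $v$ is at most $\lceil 2^p \ln(1/\delta)/\epsilon \rceil$ because $\ln(1/q) \ge \epsilon/2^p$.

```python
import math


def repetitions_needed(eps: float, p: float, delta: float) -> int:
    """Smallest v with (1 - eps / 2**p) ** v <= delta.

    Assumes each round fails with probability at most q = 1 - eps / 2**p
    (the O(n^-c) slack from the proof of Theorem 1 is ignored).
    """
    q = 1.0 - eps / 2 ** p
    if not (0.0 < q < 1.0) or not (0.0 < delta < 1.0):
        raise ValueError("need 0 < eps/2^p < 1 and 0 < delta < 1")
    v = math.ceil(math.log(delta) / math.log(q))
    # Guard against floating-point rounding at the boundary.
    while q ** v > delta:
        v += 1
    while v > 1 and q ** (v - 1) <= delta:
        v -= 1
    return v
```

For instance, with $\epsilon = 0.1$, $p = 1$ and $\delta = 0.01$, the sketch gives $v = 90$ rounds.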

Note that the hidden constant in the space bound of the theorem depends on $p$; in particular, factors $1/(2-p)$, $1/p$ and $1/|1-p|$ come in. The last can always be replaced by a $\log(1/\epsilon)$ factor, but the former ones are harder to handle. For $p = 2$ an extra $\log n$ factor seems to be necessary for an algorithm along these lines; see [1].

As we will see in Theorem 8, our space bound is tight for constant $\epsilon$ and $\delta$. Note that the lower bound holds even if we only require the overall distribution of the $L_p$-sampler to be close to the $L_p$ distribution, as opposed to the much stricter definition of $\epsilon$ relative error sampling.

2.1 The $L_0$ Sampler

For $p$ near zero, the method of precision sampling becomes intractable. This is because our scaling factors are $t_i^{-1/p}$, which clearly rules out $p = 0$. In the following we present an $L_0$-sampler using a different approach. First we state the following well-known result on exact recovery of sparse vectors.

Lemma 5. For $1 \le s \le n$ and $k = O(s)$ there is a random linear function $L : \mathbb{R}^n \to \mathbb{R}^k$ (generated from $O(k\log n)$ random bits) and a recovery procedure that, on input $L(x)$, outputs $x' \in \mathbb{R}^n$ or DENSE. For any $s$-sparse $x$ the output is $x' = x$ with probability 1; otherwise the output is DENSE with high probability.
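Lemma 5 is stated for general $s$. For intuition, the special case $s = 1$ can be handled with a standard fingerprinting trick, sketched below in Python. This is a swapped-in textbook technique and not necessarily the procedure behind Lemma 5. It assumes integer updates and 0-based indices, and the class name is hypothetical. The sketch keeps three counters:

- $\sum_i x_i$,
- $\sum_i i\,x_i$,
- a fingerprint $\sum_i x_i \rho^i \bmod P$ for a random $\rho$.

All three counters are linear in $x$.

```python
import random
from typing import Dict, Union

P = 2_305_843_009_213_693_951  # Mersenne prime 2^61 - 1


class OneSparseRecovery:
    """Exact recovery of 1-sparse integer vectors from three linear counters."""

    def __init__(self, n: int, rng: random.Random) -> None:
        self.n = n
        self.rho = rng.randrange(1, P)  # random evaluation point for the fingerprint
        self.a = 0                      # sum_i x_i
        self.b = 0                      # sum_i i * x_i
        self.f = 0                      # sum_i x_i * rho^i (mod P)

    def update(self, i: int, u: int) -> None:
        """Apply the stream update (i, u) to all three counters."""
        self.a += u
        self.b += i * u
        self.f = (self.f + u * pow(self.rho, i, P)) % P

    def recover(self) -> Union[Dict[int, int], str]:
        """Return {} for x = 0, {j: x_j} for 1-sparse x, otherwise "DENSE"."""
        if self.a == 0 and self.b == 0 and self.f == 0:
            return {}
        if self.a != 0 and self.b % self.a == 0:
            j = self.b // self.a        # candidate index of the non-zero entry
            if 0 <= j < self.n and self.f == (self.a * pow(self.rho, j, P)) % P:
                return {j: self.a}
        return "DENSE"
```

A 1-sparse vector is always recovered exactly. A vector with two or more non-zero coordinates is misreported only if the fingerprint collides, which for a random $\rho$ happens with probability at most about $n/P$.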

Theorem 2. There exists a zero relative error $L_0$ sampler which uses $O(\log^2 n\log(1/\delta))$ bits and outputs a coordinate $i \in [n]$ with probability at least $1 - \delta$.

Proof. We first present our algorithm assuming a random oracle, and then we remove this assumption through the use of the pseudo-random generator of Nisan [25]. Let $I_k$ for $k = 1, \ldots, \lfloor\log n\rfloor$ be subsets of $[n]$ of size $2^k$ chosen uniformly at random, and let $I_0 = [n]$. For each $k$ we run the sparse recovery procedure of Lemma 5 on the vector $x$ restricted to the coordinates in $I_k$, with $s$ set to $\lceil 4\log(1/\delta)\rceil$. We return a uniform random non-zero coordinate from the first recovery that gives a non-zero $s$-sparse vector. The algorithm fails if each recovery algorithm returns zero or DENSE.

Let $J$ be the set of coordinates $i$ with $x_i \ne 0$ (the support of $x$). Disregarding the low probability error of the procedure in Lemma 5, this procedure returns each index $i \in J$ with equal probability and never returns an index outside $J$. To bound the failure probability we observe that for $|J| \le s$ failure is not possible. For $|J| > s$, there is a $k \in [\lfloor\log n\rfloor]$ such that $E[|I_k \cap J|] = 2^k|J|/n$ is between $s/3$ and $2s/3$. For this $k$ alone, $1 \le |I_k \cap J| \le s$ is satisfied with probability at least $1 - \delta$ by the Chernoff bound, which limits the failure probability to $\delta$.

To get rid of the random oracle we use Nisan's generator [25], which produces the random bits for the algorithm from an $O(\log^2 n)$ length seed. These include the bits describing the sets $I_k$ and the ones for the eventual random choice from $I_k \cap J$. The generator fools every logspace tester. This includes the tester that, for a fixed set $J \subseteq [n]$ and $i \in [n]$, checks whether the algorithm (assuming correct reconstruction) would return $i$. Thus this version of the algorithm, now using $O(\log^2 n)$ random bits and $O(\log^2 n\log(1/\delta))$ total space, is also a zero relative error $L_0$-sampler with failure probability bounded by $\delta + O(n^{-c})$. □
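The random-oracle version of this sampler can be illustrated in Python. The sketch assumes an idealized recovery oracle: instead of maintaining the sketches of Lemma 5, it reads $x$ restricted to $I_k$ directly. It therefore demonstrates only the subsampling logic and not the space bound. The levels are tried in increasing order of $k$, which is our choice. Indices are 0-based, and the function name is hypothetical.

```python
import math
import random
from typing import Optional, Sequence


def l0_sample(x: Sequence[int], s: int, rng: random.Random) -> Optional[int]:
    """Random-oracle L0 sampler in the style of the proof of Theorem 2.

    Assumption: exact s-sparse recovery (Lemma 5) is simulated by reading
    x restricted to I_k directly, so only the subsampling logic is shown.
    Returns the index of a non-zero coordinate, or None for FAIL.
    """
    n = len(x)
    levels = int(math.log2(n)) if n > 1 else 0
    for k in range(levels + 1):
        # I_0 = [n]; for k >= 1, I_k is a uniform random subset of size 2^k.
        subset = range(n) if k == 0 else rng.sample(range(n), 2 ** k)
        support = [i for i in subset if x[i] != 0]
        if 1 <= len(support) <= s:       # recovery gave a non-zero s-sparse vector
            return rng.choice(support)   # uniform over I_k intersect J
        # Otherwise recovery returned zero or DENSE: try the next level.
    return None
```

By the symmetry of the random subsets, a successful call should return each index of the support with equal probability, whatever the values of the non-zero coordinates.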

As we will see in Theorem 8, this space bound is also tight for constant $\delta$. Better sampling is not possible even if we allow constant relative error or a small overall distance of the output from the $L_0$ distribution.
3 Finding Duplicates



Recall that, given a data stream of length $n+1$ over the alphabet $[n]$, the finding duplicates problem asks to output some $a \in [n]$ that has appeared at least twice in the stream.

Theorem 3. For any $\delta > 0$ there is an $O(\log^2 n\log(1/\delta))$ space one-pass algorithm which, given a stream of length $n+1$ over the alphabet $[n]$, outputs an $i \in [n]$ or FAIL. The probability of outputting FAIL is at most $\delta$, and the algorithm outputs a letter $i \in [n]$ that is not a duplicate only with low probability.

Proof. Let $x$ be an $n$-dimensional vector, initially zero at each coordinate. We run the $L_1$-sampler of Theorem 1 on $x$, with both relative error and failure probability set to $1/2$. Before we start processing the stream, we subtract 1 from each coordinate of $x$; i.e., we feed the updates $(i, -1)$ for $i = 1, \ldots, n$ to the $L_1$ sampling algorithm. When a stream item $i \in [n]$ comes, we increase $x_i$ by 1; i.e., we generate the update $(i, 1)$.

Observe that when the stream is exhausted, we have $x_i \ge 1$ for items $i$ that have at least two occurrences in the stream, $x_i = 0$ for items that appear once, and $x_i = -1$ for items that do not appear. Note that our $L_1$-sampler, if it does not fail, outputs an index $i$ and an approximation $x^*_i$ of $x_i$. If $x^*_i$ is positive, we output $i$; if it is negative or the $L_1$-sampler fails, we output FAIL. We have $\sum_{i=1}^n x_i = (n+1) - n = 1$, so the total weight of the positive coordinates exceeds that of the negative ones. Hence a perfect $L_1$ sample from $x$ is positive with probability more than one half. Our $L_1$-sampler has $1/2$ relative error and $1/2$ failure probability. Taking this into account, and neglecting for a second the chance that $x^*_i$ has a different sign from $x_i$, we conclude that we output a duplicate with probability at least $1/4$. The event that $x^*_i$ does not have the same sign as $x_i$ (and thus the relative error is at least 1) has low probability. This low probability can increase the failure probability and/or introduce error when we output non-duplicate items.

Repeating the algorithm $O(\log(1/\delta))$ times in parallel and accepting the first non-failing output reduces the failure rate to $\delta$ but keeps the error rate low. □
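The reduction in this proof can be simulated in Python. The sketch below assumes a perfect $L_1$ sampler in place of the small-space sampler of Theorem 1. It builds this sampler by materializing $x$ and calling `random.Random.choices` with weights $|x_i|$. Only the reduction is shown, and the function name is hypothetical.

```python
import random
from typing import List, Optional


def find_duplicate(stream: List[int], n: int, rounds: int,
                   rng: random.Random) -> Optional[int]:
    """Theorem 3 reduction with a simulated perfect L1 sampler (None = FAIL).

    Assumption: x is materialized and sampled exactly with weights |x_i|,
    standing in for the small-space L1-sampler of Theorem 1.
    Expects a stream of length n + 1 over the alphabet 1..n.
    """
    x = [0] + [-1] * n            # updates (i, -1) for i = 1..n; index 0 unused
    for a in stream:              # each stream item a yields the update (a, +1)
        x[a] += 1
    weights = [abs(v) for v in x]
    for _ in range(rounds):       # independent repetitions, first success wins
        i = rng.choices(range(n + 1), weights=weights)[0]
        if x[i] > 0:              # x_i >= 1 means i occurred at least twice
            return i
    return None
```

Since $\sum_i x_i = 1$, each round returns a duplicate with probability above one half, so a handful of rounds is enough in this idealized setting.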

As we will see in Theorem 7, our space bound is tight for constant $\delta < 1$.

It is natural to study the duplicates problem for other ranges of parameters. Assume that we have a stream of length $n - s < n$ over the alphabet $[n]$. For this problem, Gopalan et al. [14] gave an $O((s+1)\log^3 n)$ bits algorithm and an $\Omega(s)$ lower bound. Here we give an algorithm which uses $O(s\log n + \log^2 n)$ space.

Theorem 4. For any $\delta > 0$ there is an $O(s\log n + \log^2 n\log(1/\delta))$ space one-pass algorithm which, given a stream of length $n - s$ over the alphabet $[n]$, outputs NO-DUPLICATE with probability 1 if the input sequence has no duplicates. Otherwise it outputs $i \in [n]$ or reports FAIL. The returned number is a duplicate with high probability, while the probability of returning FAIL is at most $\delta$.

Proof. Let $x$ be an $n$-dimensional vector updated according to the description in the proof of Theorem 3,
i.e., $x_i$ is one less than the number of times $i$ appears in the stream. In parallel, we run the exact recovery
procedure from Lemma 5 with parameter $5s$ and the 1/2 relative error $L_1$-sampler of Theorem 1 with
failure rate 1/2, both on the vector $x$. If the recovery algorithm returns a vector (as opposed to DENSE),
we proceed and give the correct output, assuming we have learned the entire $x$. Otherwise we consider
the output of the sampling algorithm. If it is $(i, x_i^*)$ with $x_i^* > 0$, we report $i$ as a duplicate; otherwise (if
$x_i^* \le 0$ or the sampling algorithm fails) we output FAIL. Define

$$\|x\|_1^+ = \sum_{i: x_i > 0} x_i, \qquad \|x\|_1^- = \sum_{i: x_i < 0} |x_i|.$$

Note that $\|x\|_1^+ - \|x\|_1^- = \sum_{i=1}^n x_i = -s$. If $\|x\|_1^+ + \|x\|_1^- \le 5s$, then $x$ is $5s$-sparse (its entries are integers), thus the sparse
recovery procedure outputs $x$ and the algorithm makes no error. Note that the no-repetition case falls
into this category. If, however, $\|x\|_1^+ + \|x\|_1^- > 5s$, then $\|x\|_1^+ > 2s$ (as $\|x\|_1^- = \|x\|_1^+ + s$), so the probability that a perfect $L_1$ sample from $x$ is
positive is $\|x\|_1^+/\|x\|_1 > 2/5$. Taking into account the relative error and failure probability (but ignoring
the low-probability events of the sampler outputting a wrong sign or the sparse recovery algorithm reporting
a vector), we conclude that with probability at least 1/10 we get a positive sample and a correct output;
otherwise we output FAIL. The failure probability can be decreased to $\delta$ by $O(\log(1/\delta))$ independent
repetitions of the sampler. Note that the sparse recovery does not have to be repeated, as it has low error
probability.
The sparse recovery procedure takes $O(s\log n)$ bits by Lemma 5 for $s > 0$ (it takes $O(\log n)$ bits for
$s = 0$), and each instance of the $L_1$-sampler requires $O(\log^2 n)$ bits by Theorem 1, totaling $O(s\log n +
\log^2 n\log(1/\delta))$ bits. □
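A small Python sketch of the case analysis in this proof, assuming exact knowledge of $x$ (so it stands in for neither the sparse recovery procedure of Lemma 5 nor the sampler): it reports whether $x$ falls in the sparse branch and otherwise returns the probability that a perfect $L_1$ sample is positive, asserting the identities used above. The function name is hypothetical.

```python
from collections import Counter
from fractions import Fraction


def short_stream_case(stream, n):
    """Case analysis for a stream of length n - s over {0, ..., n - 1}.

    Returns ("sparse", support) when ||x||_1^+ + ||x||_1^- <= 5s, so x is
    5s-sparse; otherwise ("sample", q), where q is the probability that a
    perfect L1 sample from x is positive.
    """
    s = n - len(stream)
    counts = Counter(stream)
    x = [counts[i] - 1 for i in range(n)]
    plus = sum(v for v in x if v > 0)    # ||x||_1^+
    minus = sum(-v for v in x if v < 0)  # ||x||_1^-
    assert plus - minus == -s
    if plus + minus <= 5 * s:
        support = [i for i, v in enumerate(x) if v != 0]
        assert len(support) <= 5 * s  # integer entries: |support| <= ||x||_1
        return "sparse", support
    q = Fraction(plus, plus + minus)
    assert q > Fraction(2, 5)  # follows from plus > 2s
    return "sample", q
```

In the sparse branch the algorithm knows $x$ exactly, so it can report any positive coordinate as a duplicate, or NO-DUPLICATE if there is none.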



Here we do not have a matching lower bound, but only the $\Omega(\log^2 n + s)$ bound that follows from the $\Omega(s)$
bound in [14] and our $\Omega(\log^2 n)$ bound on the original version of the duplicates problem.

We remark that the last two theorems can be stated in a somewhat more general form. Instead of considering
repetitions in data streams, one can consider the problem of finding an index $i$ with $x_i > 0$ for a vector
$x \in \mathbb{Z}^n$ given by an update stream. Let $s = -\sum_{i=1}^n x_i$. If $s < 0$, then a positive coordinate exists and
the algorithm of Theorem 3 finds one using $O(\log^2 n\log(1/\delta))$ space with low error and at most $\delta$ failure
probability. If $s \ge 0$, a positive coordinate does not necessarily exist, but the algorithm of Theorem 4
finds one, reports that none exists, or fails, with the error, failure and space bounds claimed there.

Let us finally consider the version of the duplicates problem where we have a stream of length
$n + s > n$ over the alphabet $[n]$. Our lower and upper bounds are even farther apart in this case. A duplicate
can be found using $O(\min\{\log^2 n, (n/s)\log n\})$ bits of memory in one pass with constant probability as
follows. If we sample a random item from the stream, it will appear again unless that was the last
appearance of the letter. As there are at most $n$ last appearances in a stream of length $n + s$, the
probability for a uniform random sample to repeat later is at least $s/(n + s)$. Therefore, if $n/s < \log n$,
we can sample $4\lceil n/s\rceil$ items from the stream uniformly at random and check if one of them appears again
to obtain a constant-error algorithm for finding duplicates. If, on the other hand, $n/s \ge \log n$, we use the
algorithm in Theorem 3.
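The sampling algorithm for long streams can be sketched in Python as follows. It keeps $k = 4\lceil n/s\rceil$ independent single-item reservoirs (standard reservoir sampling, assumed here as the way to sample uniformly at random in one pass) and a flag per slot recording whether the sampled item occurred again later. Names and the use of Python's `random` module are illustrative assumptions.

```python
import math
import random


def find_duplicate_by_sampling(stream, n, s, rng=None):
    """Sketch of the sampling algorithm for streams of length n + s.

    Keeps k = 4 * ceil(n / s) independent uniform samples of stream
    positions (one single-item reservoir per slot) and remembers, for each
    slot, whether its sampled item occurred again later. Returns such an
    item, or None if no slot succeeded.
    """
    rng = rng or random.Random()
    k = 4 * math.ceil(n / s)
    item = [None] * k          # sampled item of each slot
    seen_again = [False] * k   # did it reappear after being sampled?
    for t, a in enumerate(stream, start=1):
        # Mark slots whose sampled item reappears at this later position.
        for j in range(k):
            if item[j] == a:
                seen_again[j] = True
        # Reservoir step: the t-th item replaces each slot with probability
        # 1/t, so every slot holds a uniformly random position of the prefix.
        for j in range(k):
            if rng.random() < 1.0 / t:
                item[j] = a
                seen_again[j] = False
    for j in range(k):
        if seen_again[j]:
            return item[j]
    return None
```

Each slot stores one item of $O(\log n)$ bits, which gives the $O((n/s)\log n)$ space bound; any returned item is guaranteed to occur at least twice.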

Combining our lower bound for the original version of the duplicates problem with the simple lower
bound of $\Omega(\log n)$, we conclude that any streaming algorithm that finds a duplicate in length $n + s$
streams must use $\Omega(\log^2(n/s) + \log n)$ bits.

4 Lower Bounds 

All our lower bounds follow from reductions from the augmented indexing problem. This problem is defined as follows.
Let $k$ and $m$ be positive integers. The first player, Alice, is given a string $x \in [k]^m$, while the second
player, Bob, is given an integer $i \in [m]$ and $x_j$ for $j < i$. Alice sends a single message to Bob, and Bob
should output $x_i$.

Lemma 6 ([22]). In any one-way protocol in the joint random source model with success probability at
least $1 - \delta > 3/(2k)$, Alice must send a message of size $\Omega((1 - \delta)m\log k)$.

We will use this lemma by reducing augmented indexing to other communication or streaming problems.

4.1 Universal Relation 

Consider the following two-player communication game. Alice gets a string $x \in \{0, 1\}^n$, Bob gets
$y \in \{0, 1\}^n$, with the promise that $x \ne y$. The players exchange messages, and the last player to receive a
message must output an index $i \in [n]$ such that $x_i \ne y_i$. We call this the universal relation communication
problem and denote it by $\mathrm{UR}^n$.

This relation has been studied in detail for deterministic communication, as it naturally arises in the
context of Karchmer-Wigderson games [19]. We note, however, that our definition is slightly unusual: in
most settings both players must obtain the same index $i$ such that $x_i \ne y_i$, whereas we are satisfied with
the last player to receive a message learning such an $i$. Clearly, the stronger requirement can be met with
$\lceil\log n\rceil$ additional bits and one additional round. The additional bits are needed in the deterministic case,
but we are not concerned with $O(\log n)$ terms for our bounds, so the two models are almost equivalent
up to a shift of one in the number of rounds.

The best deterministic protocol for $\mathrm{UR}^n$ is due to Tardos and Zwick [26]. Improving a previous result
by Karchmer [18], they gave a 3-round deterministic protocol using $n + 2$ bits of communication with
both players learning the same index $i$, and showed that $n + 1$ bits are necessary for such a protocol. They
also gave an $(n - \lfloor\log n\rfloor + 2)$-bit 2-round deterministic protocol for our weaker version of the problem,
which is also tight except for the $+2$ term. They also gave an $(n - \lfloor\log n\rfloor + 4)$-bit 4-round protocol, where
both players find an index where $x$ and $y$ differ, but not necessarily the same index. This shows that
finding the same difference is harder.

Let $R^k_\delta(U)$ denote the $k$-round $\delta$-error communication complexity of the communication problem $U$.
We write $R_\delta(U)$ to denote the $\delta$-error communication complexity when the number of rounds is not
bounded. The following proposition follows from ideas similar to those used for our $L_0$-sampler. See the
Appendix for a sketch of the proof.

Proposition 5. It holds that $R^1_\delta(\mathrm{UR}^n) = O(\log^2 n\log(1/\delta))$ and $R^2_\delta(\mathrm{UR}^n) = O(\log n\log(1/\delta))$.

We remark that along similar lines one can find an $O(\log n\log\log n\log(1/\delta))$ space two-pass zero
relative error $L_0$-sampling algorithm, by estimating $L_0$ of the vector defined by the stream in the first
pass using [17]. Next we will show that the above proposition is best possible up to the $O(\log(1/\delta))$ factors.
We start with an averaging lemma. The proof can be found in the Appendix.

Lemma 7. Any protocol for $\mathrm{UR}^n$ can be turned into one that outputs every index $i \in [n]$ with $x_i \ne y_i$
with the same probability. The new protocol uses a joint random source. The number of bits sent, the
number of rounds and the error probability do not change.

Theorem 6. For any constant $\delta < 1$ we have $R^1_\delta(\mathrm{UR}^n) = \Omega(\log^2 n)$ and $R_\delta(\mathrm{UR}^n) = \Omega(\log n)$.

Proof. The second bound comes from considering a uniformly random pair $(x, y)$ with Hamming distance
1. Either player needs to receive $\log n$ bits of information to learn the only index where the strings differ.

To prove the first bound, suppose Alice and Bob want to solve the augmented indexing problem with
Alice receiving $z \in [2^t]^s$ and Bob getting $i \in [s]$ and $z_j$ for $j < i$.

Let them construct real vectors $u$ and $v$ as follows. Let $e_q \in \mathbb{R}^{2^t}$ be the standard unit vector in the
direction of coordinate $q$. Alice forms the vectors $w_j$ by concatenating $2^{s-j}$ copies of $e_{z_j}$, then she forms
$u$ by concatenating these vectors $w_j$ for $j \in [s]$. The dimension of $u$ is $n = (2^s - 1)2^t$. Bob obtains $v$
by concatenating the same vectors $w_j$ for $j \in [i - 1]$ (these are known to him) and then concatenating
enough zeros to reach the same dimension $n$.

Now Alice and Bob perform the $\delta$-error one-round protocol for $\mathrm{UR}^n$ of length $R^1_\delta(\mathrm{UR}^n)$ on $u$ and $v$. By Lemma 7
we may assume the protocol returns a uniformly random index where $u$ and $v$ differ. Note that each such
index reveals one coordinate $z_j \in [2^t]$ to Bob for some $j \ge i$. As $z_i$ is revealed by $2^{s-i}$ of the $2^{s-i+1} - 1$ such indices,
more than half the time when the $\mathrm{UR}^n$ protocol does not err, Bob learns the correct value of $z_i$. This yields an
$R^1_\delta(\mathrm{UR}^n)$ length one-way protocol for augmented indexing with error probability less than $(1 + \delta)/2$. By Lemma 6
we have $R^1_\delta(\mathrm{UR}^n) = \Omega(st)$. Choosing $s = t$ gives $n = (2^s - 1)2^s < 4^s$, so $st = \Omega(\log^2 n)$, which proves the theorem. □
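To make the construction concrete, the following Python sketch (hypothetical helper names; values of $z$ are taken in $1..2^t$) builds $u$ and $v$ for a small instance and computes the exact fraction of differing indices that lie in block $w_i$, i.e. that reveal $z_i$. It checks the counting behind the claim that a uniformly random differing index reveals $z_i$ with probability more than 1/2; it does not run any UR protocol.

```python
from fractions import Fraction


def ur_instance(z, i, t):
    """Vectors u (Alice) and v (Bob) from the proof of Theorem 6.

    z is Alice's string in [2^t]^s with 1-based values 1..2^t; Bob knows i
    (1-based) and z_1, ..., z_{i-1}. Block w_j is 2^(s-j) copies of the unit
    vector e_{z_j} in R^(2^t).
    """
    s, width = len(z), 2 ** t

    def w(j):  # j is 1-based
        e = [0] * width
        e[z[j - 1] - 1] = 1
        return e * 2 ** (s - j)

    u = [c for j in range(1, s + 1) for c in w(j)]
    v = [c for j in range(1, i) for c in w(j)]
    v += [0] * (len(u) - len(v))  # pad with zeros to dimension n
    return u, v


def fraction_revealing_z_i(z, i, t):
    """Fraction of the indices where u and v differ that lie in block w_i."""
    u, v = ur_instance(z, i, t)
    s, width = len(z), 2 ** t
    diff = [q for q in range(len(u)) if u[q] != v[q]]
    start = (2 ** s - 2 ** (s - i + 1)) * width  # offset of w_i
    end = start + 2 ** (s - i) * width
    inside = sum(1 for q in diff if start <= q < end)
    return Fraction(inside, len(diff))
```

Within block $w_i$, the position of a differing index modulo $2^t$ gives $z_i$ directly, which is how Bob decodes.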

4.2 Finding Duplicates 

Theorem 7. Any one-pass streaming algorithm that outputs a duplicate with constant probability uses
$\Omega(\log^2 n)$ space. This remains true even if the stream is not allowed to have an element repeated more
than twice.

Proof. We show our claim by a reduction from the universal relation. Alice and Bob are given binary strings
$x$ and $y$ of length $n$, respectively, and are guaranteed that $x \ne y$. Alice
sends a message to Bob, after which Bob must output an index $i \in [n]$ such that $x_i \ne y_i$. By Theorem 6,
solving this problem with 1/2 error probability requires $\Omega(\log^2 n)$ bits of one-way communication. Alice
constructs the set $S = \{2i - 1 + x_i \mid i \in [n]\} \subseteq [2n]$ and Bob constructs $T = \{2i - y_i \mid i \in [n]\} \subseteq [2n]$.
Observe that $|S| = |T| = n$, and $x_i \ne y_i$ if and only if either $2i$ or $2i - 1$ is in both $S$ and $T$.

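Next, using the shared randomness, the players pick a random subset $P$ of $[2n]$ of size $n$. We have

$$\Pr[|S \cap P| + |T \cap P| \ge n + 1] > 1/8.$$

To see this, let $i \in S \cap T$ and $j \in [2n] \setminus (S \cap T)$. We have $|P \cap \{i, j\}| = 1$ with probability more than
1/2. The sets $P$ satisfying this can be partitioned into classes of size four by putting $Q \cup \{i\}$, $Q \cup \{j\}$
and their complements in the same class, for any $Q \subseteq [2n] \setminus \{i, j\}$ with $|Q| = n - 1$. Clearly, at least one of
the four sets $P$ in each class satisfies $|S \cap P| + |T \cap P| > n$ (if $a = |S \cap Q| + |T \cap Q|$, this sum is $a + 2$ for
$P = Q \cup \{i\}$ and at least $2n - a - 1$ for $P = [2n] \setminus (Q \cup \{j\})$).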
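The following Python sketch (illustrative names) builds $S$ and $T$ from $x$ and $y$ using 1-based indices and computes the probability above exactly by enumerating all $n$-subsets $P$ of $[2n]$. It is only feasible for very small $n$, but it lets one check both the characterization of $S \cap T$ and the 1/8 bound.

```python
from fractions import Fraction
from itertools import combinations


def alice_bob_sets(x, y):
    """S = {2i - 1 + x_i} and T = {2i - y_i} for i = 1..n (1-based)."""
    n = len(x)
    S = {2 * i - 1 + x[i - 1] for i in range(1, n + 1)}
    T = {2 * i - y[i - 1] for i in range(1, n + 1)}
    return S, T


def prob_large_intersection(S, T, n):
    """Exact Pr[|S & P| + |T & P| >= n + 1] for a uniform n-subset P of [2n]."""
    subsets = list(combinations(range(1, 2 * n + 1), n))
    good = sum(
        1 for P in subsets
        if len(S.intersection(P)) + len(T.intersection(P)) >= n + 1
    )
    return Fraction(good, len(subsets))
```

For example, $x = 010$ and $y = 110$ give $S = \{1, 4, 5\}$ and $T = \{1, 3, 6\}$, whose only common element $1 = 2 \cdot 1 - 1$ points to the single differing index $i = 1$.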

Given a streaming algorithm $A$ for finding duplicates, Alice feeds the elements of $S \cap P$ to $A$ and
sends the memory contents over to Bob, along with the integer $|S \cap P|$. If $|S \cap P| + |T \cap P| < n + 1$, Bob
outputs FAIL. Otherwise, Bob feeds $n + 1 - |S \cap P|$ arbitrary elements of $T \cap P$ to $A$. Note that no element
repeats more than twice.

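On the other hand, $|P| = n$ and we always give $n + 1$ elements of $P$ to the algorithm, so the stream contains a
duplicate; since Alice and Bob each feed distinct elements, every duplicate lies in $S \cap T$. Thus, with
constant probability, Bob finds an $a \in S \cap T$, which in turn reveals an $i$ such that $x_i \ne y_i$. Therefore, by
Theorem 6, any algorithm for finding duplicates must use $\Omega(\log^2 n)$ bits. □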
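The stream Bob completes can also be simulated. In the Python sketch below, a brute-force duplicate finder stands in for the streaming algorithm $A$ (an assumption for illustration only), and Bob's arbitrary choice of elements of $T \cap P$ is fixed to the smallest ones. The helper names are hypothetical.

```python
from collections import Counter


def reduction_stream(S, T, P, n):
    """Stream fed to the duplicates algorithm, or None if Bob outputs FAIL.

    Alice feeds S & P; Bob then feeds n + 1 - |S & P| elements of T & P
    (the text allows arbitrary ones; this sketch takes the smallest).
    """
    alice = sorted(S & P)
    bob_pool = sorted(T & P)
    if len(alice) + len(bob_pool) < n + 1:
        return None  # Bob outputs FAIL
    return alice + bob_pool[: n + 1 - len(alice)]


def duplicates(stream):
    """Brute-force stand-in for the streaming algorithm A (illustration only)."""
    return sorted(a for a, c in Counter(stream).items() if c >= 2)


def index_from_duplicate(a):
    """A duplicate a lies in S & T, i.e. a is 2i - 1 or 2i with x_i != y_i."""
    return (a + 1) // 2
```

Because $|P| = n$ and $n + 1$ elements of $P$ are fed, a duplicate always exists whenever Bob does not output FAIL.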

4.3 $L_p$-sampling

Our algorithm for the duplicates problem (Theorem 3) is based on $L_1$-sampling, thus the matching lower
bound for the duplicates problem implies a similar matching bound for the sampling problem. We state
this result here. Notice that the $L_p$ distribution corresponding to $\{0, \pm 1\}$ vectors is independent of $p$, so
$p$ does not have to be specified for the next theorem.
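A quick Python check of this remark: every nonzero coordinate of a $\{0, \pm 1\}$ vector has $|x_i|^p = 1$, so its $L_p$ distribution is uniform on the support for every $p > 0$. The function name is illustrative.

```python
def lp_distribution(x, p):
    """The L_p distribution of x: coordinate i gets |x_i|^p / ||x||_p^p."""
    weights = [abs(v) ** p for v in x]
    total = sum(weights)
    return [w / total for w in weights]
```

For a vector with entries other than $0, \pm 1$, such as $(2, 1)$, the distribution does depend on $p$.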

Theorem 8. Any one-pass $L_p$-sampler whose output distribution has variation distance at most 1/3 from the
$L_p$ distribution corresponding to $x$ requires $\Omega(\log^2 n)$ bits of memory. This holds even
when all the coordinates of $x$ are guaranteed to be $-1$, $0$ or $1$.

For constants $\delta < 1$ and $\varepsilon < 1$, the same lower bound holds for any $\varepsilon$ relative error $L_p$-sampler with
failure probability $\delta$.

Proof. Consider the $L_1$ sampling algorithm that we used to prove Theorem 3. Given a stream of items
from $[n]$, we turned it into an update stream for an $n$-dimensional vector $x$ by first producing an update
$(i, -1)$ for all $i \in [n]$ and then producing an update $(i, 1)$ for every letter $i$ in the stream. Assuming that no
item appears more than twice in the stream, all coordinates of the final vector $x$ are $-1$, $0$ or $1$. The $L_1$
distribution for $x$ puts weight more than 1/2 on the coordinates having value 1. These are the duplicates.
Thus if we have another algorithm such that the variation distance of its output from this
$L_1$ distribution is at most 1/3, then it returns a coordinate with value 1 with probability at least $1/2 - 1/3 = 1/6$. For an $\varepsilon$ relative
error, $\delta$ failure probability approximate $L_p$-sampler, the same probability is at least $(1 - \varepsilon)(1 - \delta)/2 - n^{-c}$.
Finding a coordinate in $x$ with value 1 is the same as finding a duplicate in the original stream, so we
need $\Omega(\log^2 n)$ memory by Theorem 7. □

4.4 Heavy Hitters 

The heavy hitters problem in the streaming model is defined as follows. Let $x$ be an $n$-dimensional integer
vector given by an update stream. A heavy hitters algorithm with parameters $p > 0$ and $\phi > 0$ is required
to output a set $S \subseteq [n]$ that contains all $i$ with $|x_i| \ge \phi\|x\|_p$ and no $i$ such that $|x_i| < (\phi/2)\|x\|_p$. We call
such $S$ a valid heavy hitter set.¹ In this part, we show a tight lower bound for the space complexity of
randomized algorithms (assuming constant probability of error) for the heavy hitters problem. First we
briefly review the upper bounds.
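The definition can be written as a checker. This Python sketch (hypothetical names) uses the thresholds exactly as stated above; indices with $(\phi/2)\|x\|_p \le |x_i| < \phi\|x\|_p$ may or may not be included.

```python
def lp_norm(x, p):
    """||x||_p for p > 0."""
    return sum(abs(v) ** p for v in x) ** (1.0 / p)


def is_valid_heavy_hitter_set(S, x, p, phi):
    """S must contain every i with |x_i| >= phi * ||x||_p and no i with
    |x_i| < (phi / 2) * ||x||_p; indices in between are optional."""
    norm = lp_norm(x, p)
    must = {i for i, v in enumerate(x) if abs(v) >= phi * norm}
    forbidden = {i for i, v in enumerate(x) if abs(v) < phi / 2 * norm}
    return must <= set(S) and not (set(S) & forbidden)
```

For $x = (10, 3, 1, 0)$ and $\phi = 1/2$, index 1 is forbidden when $p = 1$ but optional when $p = 2$, which shows how the choice of norm changes the set of acceptable answers.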

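The count-median algorithm from [8] gives an $O(\phi^{-1}\log^2 n)$ space upper bound for the case $p = 1$.
Here we note that count-sketch [6] in fact gives an $O(\phi^{-p}\log^2 n)$ space upper bound for all $p \in (0, 2]$. The
case $p = 2$ easily follows from the count-sketch guarantee stated earlier. Let $d = \mathrm{Err}_2^m(x)/m^{1/2}$. In general, it holds that $d \le \|x\|_p/m^{1/p}$ for
any $p \in (0, 2]$. Indeed, let $H \subseteq [n]$ be the set of the $m$ indices with the largest $|x_i|$, so that $d^2 = \sum_{i \notin H} x_i^2/m$, and let
$c = \max_{i \notin H}|x_i|$ (if $c = 0$, then $d = 0$). Since $|x_i| \ge c$ for $i \in H$ and $|x_i| \le c$ for $i \notin H$, we have

$$\frac{\|x\|_p^p}{m} = \sum_{i \in [n]} \frac{|x_i|^p}{m} \ge c^p + \sum_{i \notin H} \frac{|x_i|^p}{m} \ge c^p + c^{p-2} \sum_{i \notin H} \frac{x_i^2}{m} = c^p + c^{p-2} d^2 \ge c^p\left((1 - p/2) + (p/2)c^{-2}d^2\right) \ge c^p (c^{-2} d^2)^{p/2} = d^p,$$

where the last inequality is the weighted AM-GM inequality. Therefore, setting $m = 1/\phi^p$ in the count-sketch scheme
gives the desired result.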
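The inequality $d \le \|x\|_p/m^{1/p}$ can be sanity-checked numerically. In this Python sketch, `tail_error` computes $d$ by discarding the $m$ largest-magnitude coordinates (our reading of $\mathrm{Err}_2^m$), and `lp_bound` computes the right-hand side; both names are illustrative.

```python
import math


def tail_error(x, m):
    """d = Err_2^m(x) / sqrt(m): the l2 norm of x without its m
    largest-magnitude coordinates, divided by sqrt(m)."""
    tail = sorted((abs(v) for v in x), reverse=True)[m:]
    return math.sqrt(sum(v * v for v in tail) / m)


def lp_bound(x, m, p):
    """Right-hand side ||x||_p / m^(1/p) of the inequality."""
    return sum(abs(v) ** p for v in x) ** (1.0 / p) / m ** (1.0 / p)
```

For instance, $x = (4, 3, 0, 0)$ and $m = 1$ give $d = 3$, while the bound is $5$ for $p = 2$ and $7$ for $p = 1$.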

We remark that a similar upper bound for the heavy hitters problem is shown in [16] (cf. Theorem
1), albeit via different arguments. In the next theorem, we show that the above upper bound is tight for
any reasonable range of parameters. Our lower bound holds even in the strict turnstile model and even
for very short streams.

Theorem 9. Let $p > 0$ and $\phi \in (0, 1)$ be reals. Any one-pass heavy hitters algorithm in the strict
turnstile model uses $\Omega(\phi^{-p}\log^2 n)$ space.

Proof. Suppose there is a one-pass heavy hitters algorithm for parameters $p$ and $\phi$. We allow for a
random oracle and assume the updates are integers polynomially bounded in $n$. We can also restrict
the number of updates to be $O(\phi^{-p}\log n)$ and assume all coordinates of the final vector are nonnegative
(strict turnstile model). We turn this streaming algorithm into a protocol for augmented indexing in a
similar way as we transformed the protocol for $\mathrm{UR}^n$ into a protocol for augmented indexing in the proof of
Theorem 6. The exponential growth is now achieved not by repetition but by multiplying the coordinates
with a growing factor.

¹ In general, the parameter $\phi/2$ can be replaced by any $\varepsilon < \phi$. Since here our focus is on lower bounds, we have
simplified the definition.

Suppose Alice and Bob want to solve the augmented indexing problem, where Alice receives $y \in [2^t]^s$
and Bob gets $i \in [s]$ and $y_j$ for $j < i$. Let them construct real vectors $u$ and $v$ as follows. Let
$b = (1 - (2\phi)^p)^{-1/p}$ and let $e_q \in \mathbb{R}^{2^t}$ be the standard unit vector in the direction of coordinate $q$. Alice
obtains $u$ by concatenating the vectors $\lceil b^{s-j}\rceil e_{y_j}$ for $j \in [s]$. The dimension of $u$ is $n' = s2^t$. Bob
obtains $v$ by concatenating the same vectors for $j \in [i - 1]$ and then concatenating enough zeros, namely
$(s - i + 1)2^t$, to reach the same dimension $n'$. Now Alice and Bob run the heavy hitters algorithm on
the vector $x = u - v$ as follows. Alice generates the necessary updates to increase the initially zero vector
$x \in \mathbb{Z}^{n'}$ to reach $x = u$, maintains the memory content throughout these updates and sends the final
content to Bob. Now Bob generates the necessary updates to decrease $x = u$ to its final value $x = u - v$
and maintains the memory throughout. Finally, Bob learns the heavy hitter set $S$ the streaming algorithm
produces and outputs $z \in [2^t]$ if the smallest index in $S$ is $(i - 1)2^t + z$.

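We claim that the above protocol errs only if the streaming algorithm makes an error. Notice that
all coordinates of $x = u - v$ are zero except the ones of the form $x_{I_j} = \lceil b^{s-j}\rceil$ for $I_j = (j - 1)2^t + y_j$,
where $i \le j \le s$. Thus $x_{I_i}$ is the first nonzero coordinate, and the zero coordinates before it never belong to a
valid heavy hitter set. So the claim is true if $x_{I_i} \ge \phi\|x\|_p$. Using $\lceil a\rceil < 2a$ for $a \ge 1$, we get exactly this:

$$\phi^p\|x\|_p^p = \phi^p\sum_{j=i}^{s}\lceil b^{s-j}\rceil^p < (2\phi)^p\sum_{j=i}^{s} b^{p(s-j)} < \frac{(2\phi)^p b^{p(s-i+1)}}{b^p - 1} = b^{p(s-i)} \le x_{I_i}^p,$$

where the equality uses $b^p = 1/(1 - (2\phi)^p)$, i.e., $b^p - 1 = (2\phi)^p b^p$.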

Let us now choose $s = \lceil (2\phi)^{-p}\log n\rceil$ and $t = \lfloor (\log n)/2\rfloor$. For large enough $n$ this gives $n' = s2^t \le n$, and
all coordinates of $x$ throughout the procedure remain under $n$. Still, if the streaming algorithm works with
probability over 1/2, then by Lemma 6 the message size of the devised protocol is $\Omega(st) = \Omega(\phi^{-p}\log^2 n)$.
This proves the theorem, as the message size of the protocol is the same as the memory size of the
streaming algorithm. □
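The following Python sketch (illustrative names; entries of $y$ are taken in $1..2^t$) builds $x = u - v$ for a small instance, using floating-point arithmetic for $b$, and checks the two facts the proof relies on: the first nonzero coordinate sits at $I_i$, so Bob decodes $y_i$ from its position, and that coordinate is $\phi$-heavy in the $L_p$ norm. It assumes $2\phi < 1$ so that $b$ is defined, and it does not model the streaming algorithm itself.

```python
import math


def heavy_hitter_instance(y, i, t, p, phi):
    """x = u - v from the proof of Theorem 9 (1-based y values and index i).

    Block j (of width 2^t) of u is ceil(b^(s-j)) * e_{y_j} with
    b = (1 - (2 phi)^p)^(-1/p); v repeats blocks 1..i-1 of u and is zero
    afterwards. Assumes 2 * phi < 1 so that b is defined.
    """
    s, width = len(y), 2 ** t
    b = (1 - (2 * phi) ** p) ** (-1.0 / p)
    u = [0] * (s * width)
    for j in range(1, s + 1):
        u[(j - 1) * width + y[j - 1] - 1] = math.ceil(b ** (s - j))
    v = u[: (i - 1) * width] + [0] * ((s - i + 1) * width)
    return [a - c for a, c in zip(u, v)]


def bob_decodes(y, i, t, p, phi):
    """Return (decoded z, whether the first nonzero coordinate is phi-heavy)."""
    x = heavy_hitter_instance(y, i, t, p, phi)
    norm = sum(abs(c) ** p for c in x) ** (1.0 / p)
    first = next(q for q, c in enumerate(x) if c != 0)  # 0-based I_i - 1
    z = first - (i - 1) * 2 ** t + 1
    return z, x[first] >= phi * norm
```

With $p = 1$ and $\phi = 0.3$ we get $b = 2.5$, so the four blocks of $u$ carry the values $16, 7, 3, 1$.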



References 

[1] Alex Andoni, Robert Krauthgamer, and Krzysztof Onak. Streaming algorithms via precision sam- 
pling. Manuscript, 2010. 

[2] Khanh Do Ba, Piotr Indyk, Eric Price, and David P. Woodruff. Lower bounds for sparse recovery. 
In SODA, pages 1190-1197, 2010. 

[3] Brian Babcock, Mayur Datar, and Rajeev Motwani. Sampling from a moving window over streaming 
data. In SODA, pages 633-634, 2002. 

[4] Radu Berinde, Graham Cormode, Piotr Indyk, and Martin J. Strauss. Space-optimal heavy hitters 
with strong error bounds. In PODS, pages 157-166, 2009. 

[5] Vladimir Braverman, Rafail Ostrovsky, and Carlo Zaniolo. Optimal sampling from sliding windows. 
In PODS, pages 147-156, 2009. 

[6] Moses Charikar, Kevin Chen, and Martin Farach-Colton. Finding frequent items in data streams. 
Theor. Comput. Sci., 312(1):3-15, 2004. 

[7] Edith Cohen, Nick G. Duffield, Haim Kaplan, Carsten Lund, and Mikkel Thorup. Stream sampling 
for variance-optimal estimation of subset sums. In SODA, pages 1255-1264, 2009. 

[8] Graham Cormode and S. Muthukrishnan. An improved data stream summary: the count-min sketch 
and its applications. J. Algorithms, 55(1):58-75, 2005.






[9] Graham Cormode, S. Muthukrishnan, and Irina Rozenbaum. Summarizing and mining inverse
distributions on data streams via dynamic inverse sampling. In VLDB, pages 25-36, 2005.

[10] Graham Cormode, S. Muthukrishnan, Ke Yi, and Qin Zhang. Optimal sampling from distributed 
streams. In PODS, pages 77-86, 2010. 

[11] Nick G. Duffield, Carsten Lund, and Mikkel Thorup. Priority sampling for estimation of arbitrary
subset sums. J. ACM, 54(6), 2007.

[12] Gereon Frahling, Piotr Indyk, and Christian Sohler. Sampling in dynamic data streams and appli- 
cations. In Proceedings of the twenty-first annual symposium on Computational geometry, SCG '05, 
pages 142-149, New York, NY, USA, 2005. ACM. 

[13] Anna Gilbert and Piotr Indyk. Sparse recovery using sparse matrices. Proceedings of the IEEE, 2010.

[14] Parikshit Gopalan and Jaikumar Radhakrishnan. Finding duplicates in a data stream. In Proceedings
of the twentieth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA '09, pages 402-411,
Philadelphia, PA, USA, 2009. Society for Industrial and Applied Mathematics.

[15] T. S. Jayram and David P. Woodruff. The data stream space complexity of cascaded norms. In
FOCS, pages 765-774, 2009.

[16] Daniel M. Kane, Jelani Nelson, Ely Porat, and David P. Woodruff. Fast moment estimation in data
streams in optimal space. Manuscript, 2010.

[17] Daniel M. Kane, Jelani Nelson, and David P. Woodruff. An optimal algorithm for the distinct
elements problem. In Proceedings of the twenty-ninth ACM SIGMOD-SIGACT-SIGART symposium
on Principles of database systems of data, PODS '10, pages 41-52, New York, NY, USA, 2010. ACM.

[18] Mauricio Karchmer. A New Approach to Circuit Depth. PhD thesis, MIT, 1989. 

[19] Mauricio Karchmer and Avi Wigderson. Monotone circuits for connectivity require super-logarithmic 
depth. In Proceedings of the twentieth annual ACM symposium on Theory of computing, STOC '88, 
pages 539-550, New York, NY, USA, 1988. ACM. 

[20] Donald E. Knuth. The Art of Computer Programming, Volume II: Seminumerical Algorithms.
Addison-Wesley, 1969.

[21] Ahmed Metwally, Divyakant Agrawal, and Amr El Abbadi. Duplicate detection in click streams. In 
WWW, pages 12-21, 2005. 

[22] Peter Bro Miltersen, Noam Nisan, Shmuel Safra, and Avi Wigderson. On data structures and asym-
metric communication complexity. In Proceedings of the twenty-seventh annual ACM symposium on
Theory of computing, STOC '95, pages 103-111, New York, NY, USA, 1995. ACM.

[23] Morteza Monemizadeh and David P. Woodruff. 1-pass relative-error Lp-sampling with applications.
In SODA, pages 1143-1160, 2010.

[24] S. Muthukrishnan. Data Streams: Algorithms and Applications. 

[25] N. Nisan. Pseudorandom generators for space-bounded computations. In Proceedings of the twenty- 
second annual ACM symposium on Theory of computing, STOC '90, pages 204-212, New York, NY, 
USA, 1990. ACM. 

[26] Gabor Tardos and Uri Zwick. The communication complexity of the universal relation. In Proceedings 
of the 12th Annual IEEE Conference on Computational Complexity, pages 247-, Washington, DC, 
USA, 1997. IEEE Computer Society. 

[27] Jun Tarui. Finding a duplicate and a missing item in a stream. In Jin-Yi Cai, S. Cooper, and Hong 
Zhu, editors, Theory and Applications of Models of Computation, volume 4484 of Lecture Notes in 
Computer Science, pages 128-135. Springer Berlin / Heidelberg, 2007. 

[28] David Woodruff and T. S. Jayram. Optimal bounds for Johnson-Lindenstrauss transforms and stream-
ing problems with low error. In SODA, 2011.






A Appendix 



A.1 Missing proofs

Proof of Proposition 5 (sketch). One way to deduce the one-round protocol is from our $L_0$-sampler: Alice
and Bob run a single-pass $L_0$-sampling algorithm on $x - y$. This can be achieved by a single message
from Alice to Bob containing the memory after the first set of updates, as in the reductions of Section 4. The
sample $i$ Bob finds is an (almost uniformly random) index with $x_i \ne y_i$.

Looking more closely at the algorithm we have presented, it finds an index where $x$ and $y$ disagree
from some set $I \subseteq [n]$ that contains at least one, but not too many, such indices. It tries $O(\log n)$ random
sets so that one of them works. One can obtain the two-round protocol by finding such a set in the first
round and concentrating on that single set in the second round. □

Proof of Lemma 7. Using the joint random source, the players take a uniformly random permutation $\pi$ of
$[n]$ and use it to permute the digits of $x$ and $y$. Further, they take a uniformly random subset $S \subseteq [n]$
and flip the digits with coordinates in $S$. This requires no communication. Then they run the original
protocol on the modified inputs and report $\pi^{-1}(i)$ if the original protocol reports $i$. By symmetry, all indices
where $x$ and $y$ differ are reported with equal probability. □
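The symmetrization in this proof can be sketched in Python. Here `smallest_difference` is a hypothetical deterministic protocol that always reports the first differing index, and the joint random source is modeled by a seeded `random.Random`. After the random permutation and bit flips, the reported original index is uniform over the positions where $x$ and $y$ differ.

```python
import random


def smallest_difference(x, y):
    """Hypothetical deterministic UR protocol: the first index where x, y differ."""
    return next(q for q in range(len(x)) if x[q] != y[q])


def symmetrized(protocol, x, y, rng=None):
    """Lemma 7's transformation using shared randomness rng.

    Position q of the permuted strings holds original coordinate pi[q], so
    pi plays the role of the inverse permutation in the text. Flipping the
    same random bits in both strings keeps the set of differing positions.
    """
    rng = rng or random.Random()
    n = len(x)
    pi = list(range(n))
    rng.shuffle(pi)
    flip = [rng.randrange(2) for _ in range(n)]
    x2 = [x[pi[q]] ^ flip[q] for q in range(n)]
    y2 = [y[pi[q]] ^ flip[q] for q in range(n)]
    q = protocol(x2, y2)
    return pi[q]  # map the reported position back to the original index
```

The deterministic protocol alone always answers with the smallest differing index; after symmetrization each differing index is reported with probability $1/|\{i : x_i \ne y_i\}|$.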